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A  General  Iterative  Regularization  Framework  For 

Image  Denoising 

Michael  R.  Charest  Jrd,  Michael  Elad^,  Peyman  Milanfar^ 


Abstract — Many  existing  techniqnes  for  image  denoising  can 
be  expressed  in  terms  of  minimizing  a  particular  cost  function. 
We  address  the  problem  of  denoising  images  in  a  novel  way  by 
iteratively  refining  the  cost  function.  This  allows  us  some  control 
over  the  tradeoff  between  the  bias  and  variance  of  the  image 
estimate.  The  result  is  an  improvement  in  the  mean-squared 
error  as  well  as  the  visual  quality  of  the  estimate.  We  consider 
four  different  methods  of  updating  the  cost  function  and 
compare  and  contrast  them.  The  framework  presented  here 
is  extendable  to  a  very  large  class  of  image  denoising  and 
reconstruction  methods.  The  framework  is  also  easily  extendable 
to  deblurring  and  inversion  as  we  briefly  demonstrate.  The 
effectiveness  of  the  proposed  methods  is  illustrated  on  a  variety 
of  examples. 

Keywords:  regularization,  image  denoising,  iterative,  bias, 
variance 


1.  Introduction 
Consider  the  noisy  image  y  given  by 

y  =  x  +  v  (1) 


where  x  is  the  true  image  that  we  would  like  to  recover  and 
V  is  zero-mean  additive  white  noise  that  is  uncorrelated  to 
X  and  with  no  assumptions  made  on  its  distribution.  Note 
that  for  ease  of  notation  we  carry  out  all  of  our  analysis  with 
vectors  representing  1-D  signals,  though  the  treatment  is  valid 
in  multiple  dimensions. 

A  very  general  technique  for  estimating  x  from  the  noisy 
image  y  is  to  minimize  a  cost  function  of  the  form 


X  =  argminC(x,y). 

X 


(2) 


Some  specific  examples  of  this  are  when  (^(xjy)  = 

i7(x,y)  +  J(x)  and  H{x,y)  =  2||y  —  x|p.  The  estimate 
then  becomes 


X  =  arg  min 

X 


l|y-x|i 


(3) 


where  J(x)  is  a  convex  regularization  functional  such  as 
those  in  Table  I.  The  parameter  A  controls  the  amount  of 
regularization. 

For  the  regularization  term  corresponding  to  the  bilateral 
filter,  S”  is  a  matrix  shift  operator  and  W„  is  a  weight 
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Denosing  Technique 

J(x) 

Tikhonov 

2  ll^ll 

Total  Variation  [I],  [2] 

A!l|Vx|||i 

Bilateral  [3],  [4] 

t  [x  -  S"x] 

TABLE  I 

Various  denoising  techniques  and  their  associated 

REGULARIZATION  TERMS. 


matrix  where  the  weights  are  a  function  of  both  the  radiometric 
and  spatial  distances  between  pixels  in  a  local  neighborhood 
([3], [4]). 

Figure  1  is  an  example  of  the  denoising  ability  of  the 
bilateral  filter  on  an  image  with  added  white  Gaussian  noise. 
By  looking  at  the  estimate  residual  we  notice  that  we  have 
removed  some  of  the  high  frequency  content  of  the  image 
along  with  the  noise.  This  is  true  more  generally  with  other 
denoising  techniques  as  well.  In  this  paper,  we  now  turn  our 
attention  to  recovering  this  lost  detail. 


II.  Iterative  Regularization  Methods 

The  general  framework  that  we  present  here  seeks  to 
improve  our  image  estimate  by  iteratively  updating  the  cost 
function  of  our  choosing.  We  can  express  this  as 

Xfe  =  arg  min  Cfe  (x ,  y ) .  (4) 

X 

Various  manifestations  of  this  iterative  regularization  proce¬ 
dure  exist.  We  present  four  different  algorithms  for  performing 
the  cost  function  update.  Each  algorithm  seeks  to  extract  lost 
detail  from  the  the  residual  y  —  x^  in  a  unique  way. 

Using  the  operator  B{-)  to  denote  the  net  effect  of  the 
minimization  in  (4),  we  formulate  the  iterative  regularization 
methods  as 

1)  Xk+i=B(y  +  X;t=i  (y 

2)  xfe+i  =  B{y)  +  B  (E?=i  (y  -  X*)) , 

3)  Xfe+i  =  B{y)  +  B{y  -  x,),  and 

4)  ife+i  =  B{y)  +  Yjl=i  (^(y)  -  '8(ii)) 

=  {k+l)B{y)-Y.UB{x,). 

The  first  method  was  recently  proposed  in  [1]  by  Osher  et 
al.  We  propose  the  other  three  methods  as  alternatives,  where 
Method  (3)  is  a  generalization  of  Tukey’s  ’’twicing”  idea  [5]. 
Notice  that  it  is  evident,  from  the  above  definitions,  that  if 
B{-)  is  naively  regarded  as  a  linear  operator,  all  these  options 
are  equivalent. 
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Fig.  1.  (a)  Detail  of  the  original  ’Barbara’  image  (b)  ’Barbara’  with  added 

while  Gaussian  noise  of  variance  29.5  (PSNR=  33.43dB)  (c)  The  result  of 
minimizing  the  Bilateral  cost  function  for  the  noisy  image  (b)  (MSE=  19.30) 
(d)  The  residual  (b)-(c) 


y 


Fig.  2.  Method  1  Block  Diagram 


Osher’s  method  refines  the  more  general  cost  function 
argminx  y)  +  J(x)}  as 


Xfc+i  =  argmin{i/(x,y)  +  J(x)  -  (x,pfc)} 


(8) 


where  pg  =  0  and  p^+i  =  p^ 
are  subgradients  of  J(x)  at  0  and  Xfc+i  respectively.  The 
equivalence  of  this  form  and  (5)  can  be  verified  by  substituting 
i/(x,y)  =  ^lly  —  x||^  and  comparing  the  first  derivatives. 


B.  Method  2 

This  iterative  regularization  method  is  formulated  as: 


III.  Description  of  Iterative  Regularization 
Methods 

In  this  section  we  will  discuss  the  implementation  of  each  of 
the  iterative  regularization  algorithms  for  the  general  denoising 
cost  function  in  (3).  Additionally,  the  general  formulation  is 
presented. 


A.  Method  1:  Osher  et  al.’s  Method  [1] 


Xfc-l-l 


B{y)+B(j2{y 


(9) 


Xl  +Xfc 
Xi  +  argmin 

X 

Xl  +  arg  min 

X 


k 

^  (y  -  X*)  -  X 


l|vfc  -xf  +  J(x)| 


2 


The  work  of  Osher  et  al.  improves  the  estimate  that  results 
from  the  cost  function  in  (3)  via  the  following  algorithm, 
which  we  here  call  ’’Osher’s  Iterative  Regularization  Method” 

xfc+i  =  S  ^y  +  ^(y-x,)^  ,  (5) 

with  Xg  =  0. 

This  can  also  be  written  as 

Xfc+i  =  .g(y  +  Vfc) ,  (6) 

with  Xg  =  0,  vg  =  0,  and 


where  xi  and  are  the  same  as  in  Method  1.  We  illustrate 
this  method  in  Figure  3. 


vfc+i  =  Vfc  +  (y  -  xfc+i).  (7) 

The  quantity  can  be  interpreted  as  a  cumulative  sum  of  the 
image  residuals. 

We  note  that  the  sum  of  the  residuals  have  been  added  back 
to  the  noisy  image  and  processed  again.  The  intuition  here  is 
that  if,  at  each  iteration,  the  residual  contains  more  signal  than 
noise,  our  estimate  will  improve.  A  block  diagram  illustrating 
this  method  is  shown  in  Figure  2. 


Method  2  refines  the  more  general  cost  function 
argminx  {i7(x,  y)  +  J(x)}  as 


Xfe+i  =  Xl  +  argmin{iT(x,y  -  xi)  +  J(x)  -  (x,pfe_i)} 

X 

(11) 

with  pg  =  0  and  p^+i  =  p^  +  ^Mi^|x  =  Xfc+2  . 
The  equivalence  of  this  form  and  (10)  can  be  verified  by 
substituting  i7(x,y)  =  ^Ijy  —  x|p  and  comparing  the  first 
derivatives. 


C.  Method  3:  Iterative  "Twicing”  Regularization 

In  his  book  [5],  published  in  the  mid  1970’s,  Tukey  pre¬ 
sented  a  method  he  called  ’’twicing”  where  a  filtered  version 
of  the  data  residual  was  added  back  to  the  inital  estimate  xq 
as 

X  =  xo -f  S(y  -  xo).  (12) 

Tukey’s  original  motivation  for  this  was  to  provide  an  im¬ 
proved  method  for  data  fitting  that  would  go  beyond  a  direct 
fit  and  incorporate  additional  ’’roughness”  into  the  estimate 
in  a  controlled  way.  The  same  year,  motivated  by  this  idea, 
this  concept  was  used  by  Kaiser  and  Hamming  [6]  as  a  way 
of  sharpening  the  response  of  symmetric  FIR  linear  filters. 
Both  references  also  mentioned  the  possibility  of  iterating  this 
process.  Thus  we  here  call  the  iterated  version  of  Tukey’s 
Twicing,  ’’Iterative  Twicing  Regularization”  (ITR).  We  can 
express  this  as: 

Xfe+i  =  Xfe -I- ;B(y  -  Xfc) 

=  Xfe_i -l-.B(y  -  Xfe_i) -|-.8(y  -  Xfc) 


an  image  from  the  image  itself.  The  fourth  algorithm  for 
iterative  regularization  that  we  present  is  very  similar  in  spirit 
to  unsharp  masking.  We  call  this  method  ’’Iterative  Unsharp 
Regularization”  (lUR)  and  formulate  it  as: 

Xfe+i  =  xk  +  B{y)  -  B{xk) 

=  xfe_i  -f  {B{y)  -  B{ik-i))  +  {B{y)  -  B(xk)) 

k 

=  *1 -f  ^  (6(y)  -  .g(x,))  (15) 

where  xg  =  0  and  xi  =  -^(y)- 

We  illustrate  this  method  in  Figure  5. 


k 

=  Xi+^S(y-Xi)  (13) 

i=l 

where  xq  =  0  and  Xi  =  B{y).  The  same  idea  has  been 
used  in  the  machine  learning  community  ([9])  under  the  name 
L2Boost. 

A  block  diagram  illustrating  this  algorithm  is  shown  in 
Figure  4. 


y 


Fig.  4.  Method  3  Block  Diagram 

ITR  refines  the  more  general  cost  function 
argminx  {i7(x,  y) -I- J(x)}  as 

k 

Xfc+i  =xi  4-y'argmin{iT(x,  0)-f  J(x)  -  (x,pj_i)} 

i—1 

(14) 

with  po  =  =  (y -xi)  and 

Pfc+i  =  Pk  +  |x  =  xfc+i  ,  where  Xfc  = 

argminx  {i7(x,0) -I- J(x)  -  (x,pfc_i)}  and  Xq  =  0. 

The  equivalence  of  this  form  and  (13)  can  be  verified  by 
substituting  H{x,y)  =  i||y  —  x|p  and  comparing  the  first 
derivatives. 


Fig.  5.  Method  4  Block  Diagram 

lUR  refines  the  more  general  cost  function 
argminx  {i7(x,  y) -I- J(x)}  as 
k  k 

Xfe+I  =  xi+^Xi-^argmin{iF(x,xi)  +  J(x)  -  (x,pi_i)} 
2=1  2=1 

(16) 

with  Po  =  0  and  pfc+i  =  Pfc  +  ,  where 

Xfc  =  argminx  {iT(x,Xi)  -f  J(x)  -  (x,pfc_i)}  and  xq  =  0. 

The  equivalence  of  this  form  and  (15)  can  be  verified  by 
substituting  H{x,y)  =  ^Ijy  —  x|p  and  comparing  the  first 
derivatives. 

As  a  sidenote,  we  mention  here  that  all  four  of  the  above 
methods  can  be  considered  for  the  more  general  measurement 
model: 

y  =  Ax-fv,  (17) 

where  A  is  a  convolution  operator.  The  cost  function 
that  we  wish  to  iteratively  refine  now  would  become 
argminx  {^lly  —  Axp  +  J(x)}.  By  a  simple  substitution 
of  iT(x,y)  =  i||y  —  Ax|p  into  the  general  forms  of  the 
four  iterative  regularization  methods  presented  above,  we  can 
derive  the  following: 

1)  Xk+i  =  B  (y +  -  Axi)y 

2)  xfe+i  =  B{y)  +  B  (j2i=i  A^(y  “  Ax,)) , 

3)  Xfe+i  =  B{y)  +  Y!Li  ^  (A(y  -  Xi))>  and 

4)  ife+i  =  B{y)  +  Y,\=i  B{y)  -  X;,=i  AS  (x^). 

Again,  the  equivalence  of  these  forms  can  easily  be  checked 
by  comparing  their  first-derivatives. 

IV.  Bias- Variance  Tradeoff 


To  measure  the  effectiveness  of  the  algorithms,  the  mean- 
D.  Method  4:  Iterative  Unsharp  Regularization  squared  error  (MSB)  is  a  natural  choice.  The  MSB  is  defined 

The  process  of  unsharp  masking  is  a  well-known  technique  as  '  1  '  2I 

[7].  The  process  consists  of  subtracting  a  blurred  version  of  mse(6)  =  E  \(6  —  6)  (18) 
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Total  Variation 

Bilateral 

Method  1 

Figure  12  (a) 

Figure  13  (a) 

Method  2 

Figure  12  (b) 

Figure  13  (b) 

Method  3 

Figure  12  (c) 

Figure  13  (c) 

Method  4 

Figure  12  (d) 

Figure  13  (d) 

TABLE  II 

The  different  examples  we  present  here.  The  lowest  MSE 

RESULT  IN  EACH  COLUMN  IS  ITALICIZED. 


where  6  is  the  estimate  and  9  is  the  underlying  signal.  We  can 
rewrite  the  MSE  as 


=  E 


mse{9)  =  E 

{o-E{e)) 


(^e-E{9)'^  +  (E{e)-e 

(e-E{e))  {E{e)-e) 


2  E 


+E 


[Ei§)-9y 


var{9)  +  0  +  (^E{9)  —  9^  =  var{9)  +  bias^{9). 


Thus,  as  is  well-known,  MSE  is  the  sum  of  the  estimate 
variance  and  squared-bias  [8]. 

Ref.  [9]  provides  a  bias-variance  tradeoff  analysis  for  L2 
boosting  (which  is  equivalent  to  the  ITR  method),  however, 
some  of  the  key  assumptions  made  in  that  analysis  do  not 
apply  to  the  iterative  regularization  methods  that  we  present 
here  (namely,  in  our  general  analysis  B{-)  is  a  non-linear 
operator).  In  the  next  section,  we  present  an  experimental  bias- 
variance  tradeoff  analysis  of  the  four  iterative  regularization 
methods  that  we  have  discussed  here. 


V.  Experiments 

Table  II  provides  a  brief  summary  of  the  combinations 
of  iterative  regularization  methods  and  functionals  that  we 
illustrate  in  this  section. 


A.  Bias-Variance  Tradeoff 

In  Figure  6  we  can  see  the  MSE,  variance,  and  squared- 
bias  of  Osher’s  iterative  regularization  method  as  a  function  of 
iteration  number.  The  image  used  for  this  simulation  is  shown 
in  Figure  7.  In  Figure  6  (a),  we  use  the  Bilateral  cost  ([10],  [3]) 
in  place  of  J(x)  to  carry  out  the  cost  function  minimization 
with  parameters  N  =  2,  Od  =  IT,  and  ar  =  35;  in  (b)  we 
use  the  Total  Variation  cost  ([2])  with  A  =  .18  and  50  steepest 
descent  iterations.  Notice  that  the  squared-bias  decreases  as 
we  iterate  but  the  variance  increases.  The  mean-square  error 
optimal  estimate  occurs  where  the  sum  of  these  two  values 
is  at  a  minimum.  The  MSE,  variance,and  squared-bias  have 
been  calculated  via  Monte-Carlo  simulation  (using  50  noise 
realizations  for  each  method). 

For  all  of  the  iterative  regularization  methods  presented 
here,  the  bias  of  the  first  iteration  of  the  estimate  is  expected 
to  be  largest.  This  is  especially  true  in  the  case  of  the  Total 
Variation  and  Bilateral  functionals  because  we  are  using  cost 
functions  that  assume  that  the  underlying  image  is  piece-wise 
constant  [2],  [10],  [3].  Thus  the  first  estimate,  Xi  =  B{y) 


Osher’s  method  Osher’s  method 


iterations 


iterations 


(a)  Bilateral 


(b)  Total  Variation 


Fig.  6.  MSE,  variance,  and  squared-bias  of  the  estimates  of  the  noisy 
image  shown  in  Figure  7  (b)  using  Osher’s  iterative  regularization  method 
(with  (a)  Bilateral  and  (b)  Total  Variation  regularization  functionals). 


Fig.  7.  (a)  The  original  ’Barbara’  image  (b)  ’Barbara’  with  added  while 

Gaussian  noise  of  variance  29.5  {PSNR=  33.43dB) 


is  a  piece-wise  constant  version  of  the  image.  That  is  to  say 
that  much  of  the  high-frequency  detail  in  the  image  has  been 
removed  along  with  most  of  the  noise  causing  the  image 
estimate  to  appear  piece-wise  flat.  See  Figure  8  for  an  example. 


(a)  with  texture  (b)  piece-wise  constant 

Fig.  8.  Both  the  Bilateral  Filter  and  the  Total  Variation  filter  make  an 
assumption  of  an  underlying  piece-wise  constant  image.  Thus  the  estimates 
that  result  from  these  processes  have  a  piece-wise  flat  appearance,  (a) 
Subsection  of  the  ’Barbara’  image  (b)  The  result  of  applying  Total  Variation 
to  (a). 


As  we  continue  to  iteratively  refine  our  general  cost  function 
(4)  we  begin  to  add  back  some  of  that  lost  texture,  thus  the 
squared-bias  begins  to  decrease.  However,  since  no  method  is 
perfect,  we  do  get  a  bit  of  noise  added  back  as  well;  this  causes 
the  variance  to  increase.  At  some  point  in  the  iterative  process 
we  get  the  best  tradeoff  of  restored  texture  and  suppressed 
noise;  this  is  our  optimal  MSE  estimate. 

We  achieve  similar  results  for  the  ITR  and  lUR  methods,  as 
can  be  seen  in  Figures  9,  10  and  11.  The  MSE,  variance, and 
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squared-bias  for  each  of  the  methods  have  been  calculated  via 
Monte-Carlo  simulation  (using  50  noise  realizations  for  each 
method).  The  Bilateral  Filter  parameters  were  chosen  to  yield 
the  best  MSB  for  each  of  the  methods. 


MSE  of  estimates 


MSE  of  estimates 


2  4  6  8  1( 

iterations 


2  4  6  8  10 

iterations 


(a)  Bilateral 


(b)  Total  Variation 


we  obtain  the  estimate  with  the  lowest  mean-squared  error  for 
each  of  the  iterative  regularization  methods.  In  all  four  iterative 
regularization  methods,  50  steepest-descent  steps  were  used  to 
minimize  the  cost  function  at  each  iteration  of  (4).  The  values 
of  A  used  for  method  1,  method  2,  method  3,  and  method  4 
respectively  were:  A  =  .18,  A  =  .21,  A  =  .7,  and  A  =  .32. 
The  best  MSE  estimates  produced  by  methods  1  -4,  are  shown 
in  Figure  12  (a),  (b),  (c),  and  (d)  respectively.  The  best  way 
to  see  the  subtle  differences  between  the  methods  is  to  look  at 
the  residual  |y  —  Xfc|,  thus  these  are  shown  as  well.  A  residual 
that  contains  less  structure  and  looks  more  like  pure  noise  is  an 
indication  of  a  better  denoising  algorithm.  The  MSE,  variance, 
and  squared-bias  of  these  examples  correspond  to  the  plots  in 
Figures  6  (b),  9  (b),  10  (b),  and  11  (b). 


Fig.  9.  MSE  of  the  estimates  of  the  noisy  image  shown  in  Figure  7 
(b)  using  iterative  regularization  methods  1-4  (with  (a)  Bilateral  and  (b)  Total 
Variation  regularization  functionals). 
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Fig.  10.  Variance  of  the  estimates  of  the  noisy  image  shown  in  Figure  7 
(b)  using  iterative  regularization  methods  1-4  (with  (a)  Bilateral  and  (b)  Total 
Variation  regularization  functionals). 
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Fig.  11.  Squared-bias  of  the  estimates  xj,  of  the  noisy  image  shown  in 
Figure  7  (b)  using  iterative  regularization  methods  1-4  (with  (a)  Bilateral  and 
(b)  Total  Vaiiation  regularization  functionals). 


B.  Iterative  Regularization  Using  Total  Variation  Functional 
For  this  experiment  we  add  white  Gaussian  noise  of  variance 
(T^  =  29.5  to  the  image  ’Barbara’  shown  in  Figure  7  (a). 
The  resulting  noisy  image  has  a  PSNR  of  33.43dB  and  is 
shown  in  Figure  7  (b).  For  this  experiment  we  have  selected  the 
Total  Variation  Filter  [2]  which  has  some  control  parameters 
that  determine  the  filter  weights.  These  parameters,  as  well 
as  our  regularization  parameters,  are  tuned  by  hand  until  the 


C.  Iterative  Regularization  Using  Bilateral  Functional 

We  repeat  the  same  procedure  as  the  previous  experiment 
using  the  Bilateral  Filter  instead  of  the  Total  Variation  Filter. 

The  Bilateral  Filter  has  three  user  defined  parameters:  kernel 
size  N,  geometric  spread  ad,  and  photometric  spread 
We  list  the  values  for  these  parameters  that  we  used  in  this 
experiment  for  completeness  and  refer  the  reader  to  [10] 
and  [3]  for  more  information  on  the  Bilateral  Filter.  In  all 
four  iterative  regularization  methods  we  used  N  =  2  and 
ad  =  1.1.  The  values  of  ar  used  for  method  1,  method  2, 
method  3,  and  method  4  respectively  were:  ar  =  35,  ar  =  31, 
ar  =  14,  and  ar  =  23.  Since  the  Bilateral  Filter  minimizes  it’s 
associated  cost  function  in  one  iteration  ([3])  we  only  need  to 
apply  the  Bilateral  once  per  iteration  in  each  of  the  iterative 
regularization  methods.  The  best  MSE  estimates  of  the  four 
iterative  regularization  methods  and  their  residuals  are  shown 
in  Figure  13.  The  MSE,  variance,  and  squared-bias  of  these 
examples  correspond  to  the  plots  in  Figures  6  (a),  9  (a),  10 
(a),  and  11  (a). 


VI.  Conclusions 

Denoising  algorithms  that  can  be  formulated  as  in  (2),  such 
as  the  Bilateral  Filter  and  Total  Variation  Filter,  are  frequently 
used  due  to  their  ease  of  implementation  and  effectiveness.  We 
have  shown  that  the  iterative  regularization  methods  which  we 
present  here  can  improve  on  the  results  of  these  algorithms. 

Method  2  appears  to  give  the  best  results  in  the  experiment 
where  the  Total  Variation  functional  is  used.  However,  lUR 
gives  better  results  in  the  Bilateral  experiment.  Clearly  the 
different  iterative  regularization  methods  are  useful  and  the 
’’best”  method  to  use  can  vary  depending  on  the  regularization 
functional  and  possibly  even  the  particular  image.  This  leaves 
much  room  for  further  investigation  of  the  exact  relationship 
between  these  different  methods. 
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